feat: add Qwen3-Omni Thinker GSPO support #6238
qinganrice wants to merge 4 commits into verl-project:main
Conversation
Code Review
This pull request introduces support for the Qwen3-Omni model architecture and enhances FSDP and LoRA handling. Key changes include registering the Qwen3-Omni Thinker as a causal language model with custom forward and embedding logic, implementing a module-stripping mechanism to reduce memory usage during FSDP initialization, and adding a new reward scoring utility (`gsm8k_thinker`) designed for models that output reasoning steps. The PR also updates LoRA parameter collection to support diffusers and adds a fallback mechanism for parameter summoning.

Review feedback highlights the need to narrow broad architecture mappings to prevent conflicts with encoder-decoder models, improve exception handling during model registration, refine regex patterns in the reward scorer to handle currency symbols, and remove debug print statements from production code.
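To make the reward-scorer feedback concrete, here is a minimal sketch of thinker-aware answer extraction. The helper name and regexes are illustrative assumptions, not the PR's implementation: it drops everything up to `</think>`, prefers a `\boxed{}` answer, and tolerates a leading currency symbol, which is the kind of refinement the review asks for.

```python
import re
from typing import Optional


def extract_thinker_answer(response: str) -> Optional[str]:
    """Illustrative extraction for a thinker-style model response."""
    # Keep only the text after the reasoning block, if one is present,
    # so chain-of-thought numbers cannot leak into scoring.
    if "</think>" in response:
        response = response.split("</think>", 1)[1]

    # Prefer an explicit \boxed{...} answer.
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if boxed:
        candidate = boxed[-1]
    else:
        # Fall back to the last number. The optional leading '$' handles
        # currency-prefixed answers such as "$18", per the review note.
        numbers = re.findall(r"\$?(-?\d[\d,]*\.?\d*)", response)
        if not numbers:
            return None
        candidate = numbers[-1]

    return candidate.replace(",", "").replace("$", "").strip()
```

A scorer built on this would compare the cleaned candidate against the ground-truth answer string, much as a standard GSM8K scorer does.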
Summary
- Register the Qwen3-Omni Thinker as an `AutoModelForCausalLM` with a forward redirect to the Thinker; fix `tie_word_embeddings` and `_no_split_modules` for FSDP compatibility
- Use `min_num_params > 0` in the FSDP wrap policy to avoid nested-FSDP allgather divergence
- Align parameter dtypes after `get_peft_model` so FSDP can flatten mixed-dtype units
- Strip unused modules during `from_pretrained` via `_verl_strip_modules` to reduce memory during FSDP initialization
- Use `layered_summon` for LoRA parameter collection, with a fallback to a full summon when the layered pass returns empty (see the sketch after this list)
- Add a `text_config` fallback in monkey_patch for models without a top-level `num_attention_heads`
- `LoRARequest`
- Add a `gsm8k_thinker` reward with `</think>` extraction and `\boxed{}` support
- Register `vllm_omni`/`vllm_omni_ar` in the rollout and replica registries for verl-omni integration
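On the `layered_summon` fallback: the sketch below illustrates the idea under assumed names (`collect_lora_params` and `_is_leaf_fsdp` are illustrative, not verl's actual API). LoRA weights are gathered one leaf FSDP unit at a time to bound peak memory, with a single full summon as the fallback when the layered pass collects nothing.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def _is_leaf_fsdp(module):
    """True for an FSDP wrapper with no FSDP instances nested inside it."""
    if not isinstance(module, FSDP):
        return False
    return not any(isinstance(m, FSDP) for m in list(module.modules())[1:])


def collect_lora_params(model):
    lora_params = {}
    # Layered pass: all-gather one leaf FSDP unit at a time, so peak
    # memory stays near a single layer's full weights.
    for name, module in model.named_modules():
        if _is_leaf_fsdp(module):
            with FSDP.summon_full_params(module, writeback=False):
                for pname, param in module.named_parameters():
                    if "lora_" in pname:
                        lora_params[f"{name}.{pname}"] = param.detach().cpu().clone()
    # Fallback: the layered pass found nothing (e.g. an unexpected
    # wrapping layout), so all-gather the whole model once.
    if not lora_params:
        with FSDP.summon_full_params(model, writeback=False):
            lora_params = {
                n: p.detach().cpu().clone()
                for n, p in model.named_parameters()
                if "lora_" in n
            }
    return lora_params
```

Test plan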